Convex Optimization 2015.11.04

Schur Complement: Consider \(\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} u \\ v \end{bmatrix}\) with \(A\) invertible, \(C\) symmetric.

Have \(Ax + By = u \implies x = A^{-1} (u - By)\)

also \(B^Tx + Cy = v \implies \begin{align*} v = & B^T A^{-1} (u - By) + Cy \\ = & B^T A^{-1} u + (C - B^T A^{-1}B)y \\ = & B^T A^{-1} u + Sy \end{align*}\)

where \(S := C - B^T A^{-1}B\) is the Schur Complement of \(A\) in the matrix \(X := \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}\)

  • Corollary: If \(X\) is invertible, then \(S\) is invertible.

And \(y = S^{-1}(v - B^T A^{-1} u) = S^{-1}v - S^{-1} B^T A^{-1} u\)

also \(x = A^{-1} (u - B S^{-1} v + B S^{-1} B^T A^{-1} u) = (A^{-1} + A^{-1} B S^{-1} B^T A^{-1}) u - A^{-1} B S^{-1} v\)

Since \(\begin{bmatrix} x \\ y \end{bmatrix} = X^{-1} \begin{bmatrix} u \\ v \end{bmatrix}\) this implies \(X^{-1} = \begin{bmatrix} A^{-1} + A^{-1} B S^{-1} B^T A^{-1} & -A^{-1} B S^{-1} \\ -S^{-1} B^T A^{-1} & S^{-1} \end{bmatrix}\)
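As a sanity check, the block formula can be compared against direct inversion. Below is a minimal NumPy sketch; the dimensions, seed, and the way \(A\), \(B\), \(C\) are generated are arbitrary choices, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)  # invertible (PD)
C = rng.standard_normal((m, m)); C = (C + C.T) / 2            # symmetric
B = rng.standard_normal((n, m))

X = np.block([[A, B], [B.T, C]])
Ai = np.linalg.inv(A)
S = C - B.T @ Ai @ B                      # Schur complement of A in X
Si = np.linalg.inv(S)

# Block formula for X^{-1} derived above
Xi_blocks = np.block([[Ai + Ai @ B @ Si @ B.T @ Ai, -Ai @ B @ Si],
                      [-Si @ B.T @ Ai,              Si]])

assert np.allclose(Xi_blocks, np.linalg.inv(X))  # matches direct inversion
```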

Consider \(\inf _u \begin{bmatrix} u^T & v^T \end{bmatrix} \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \inf _u u^T A u + 2v^T B^T u + v^T C v\)

for fixed \(v\), with \(A, C\) symmetric.

gradient (in \(u\)) = \(2Au + 2Bv\); the optimum is achieved when \(Au = -Bv\).

Assume \(A \succ 0\) (so \(A^{-1}\) exists and the stationary point is a minimum); the optimum is achieved at \(u = -A^{-1} B v\)

optimum value of the \(\inf\): \(v^T B^T A^{-1} A A^{-1} B v - 2v^T B^T A^{-1} B v + v^T C v = -v^T B^T A^{-1} B v + v^T C v = v^T(C - B^T A^{-1} B)v = v^T S v\).
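This partial minimization can be checked numerically. A sketch under the same assumption \(A \succ 0\) (random instance sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)  # A > 0
C = rng.standard_normal((m, m)); C = (C + C.T) / 2
B = rng.standard_normal((n, m))
S = C - B.T @ np.linalg.inv(A) @ B

def quad(u, v):
    # [u; v]^T X [u; v] with X = [[A, B], [B^T, C]]
    return u @ A @ u + 2 * v @ B.T @ u + v @ C @ v

v = rng.standard_normal(m)
u_star = -np.linalg.solve(A, B @ v)            # minimizer u* = -A^{-1} B v
assert np.isclose(quad(u_star, v), v @ S @ v)  # optimal value is v^T S v
for _ in range(100):                           # u* is indeed a minimum
    assert quad(u_star + rng.standard_normal(n), v) >= quad(u_star, v) - 1e-9
```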

  • Lessons (for \(A \succ 0\)): If \(S \not\succeq 0\), then \(X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \not\succeq 0\), since the \(\inf\) above is negative for some \(v\).

    Conversely, \(S \succeq 0 \implies X \succeq 0\).

    Conclusion: \(A \succ 0 \implies (X \succeq 0 \iff S \succeq 0)\) (see the numerical sketch after this list).

  • Example: Consider \(f(x, Y) = x^T Y^{-1} x\), \(dom \, f = \mathbb{R}^n \times S_{++}^n\)

    Claim: \(f\) is convex (meaning: \(\forall (x_1, Y_1), (x_2, Y_2) \in dom \, f\) and \(\forall \theta \in [0, 1]\), \(f(\theta x_1 + (1 - \theta) x_2, \theta Y_1 + (1 - \theta) Y_2) \le \theta f(x_1, Y_1) + (1 - \theta) f(x_2, Y_2)\)).

  • Proof: Show that \(epi(f)\) is a convex set.

    \[\begin{align*} epi(f) = & \{ (x, Y, t) \in \mathbb{R}^n \times S_{++}^n \times \mathbb{R} : f(x, Y) \le t \} \\ = & \{ (x, Y, t) : x^T Y^{-1} x \le t \} \\ = & \{ (x, Y, t) : t - x^T Y^{-1} x \ge 0 \} \\ = & \{ (x, Y, t) : \begin{bmatrix} Y & x \\ x^T & t \end{bmatrix} \succeq 0 \} & \text{by Schur Complement, since } Y \succ 0 \end{align*}\]

    This is the inverse image of the convex cone \(S_+^{n + 1}\) under an affine map of \((x, Y, t)\), hence convex.

    (The general version of the quadratic-over-linear function \(x^2 / y\); see the numerical sketch below.)
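The sketch below (NumPy, arbitrary random instances) checks both the conclusion \(X \succeq 0 \iff S \succeq 0\) when \(A \succ 0\), and the convexity inequality for \(f(x, Y) = x^T Y^{-1} x\):

```python
import numpy as np

rng = np.random.default_rng(2)

def is_psd(M, tol=1e-9):
    return np.min(np.linalg.eigvalsh(M)) >= -tol

for _ in range(200):
    n, m = 3, 2
    A = rng.standard_normal((n, n)); A = A @ A.T + 0.1 * np.eye(n)  # A > 0
    C = rng.standard_normal((m, m)); C = C @ C.T - rng.uniform(0, 2) * np.eye(m)
    B = rng.standard_normal((n, m))
    X = np.block([[A, B], [B.T, C]])
    S = C - B.T @ np.linalg.inv(A) @ B
    assert is_psd(X) == is_psd(S)    # X >= 0  <=>  S >= 0 (given A > 0)

def f(x, Y):                         # matrix fractional function x^T Y^{-1} x
    return x @ np.linalg.solve(Y, x)

for _ in range(200):
    n = 3
    x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
    Y1 = rng.standard_normal((n, n)); Y1 = Y1 @ Y1.T + np.eye(n)
    Y2 = rng.standard_normal((n, n)); Y2 = Y2 @ Y2.T + np.eye(n)
    t = rng.uniform()
    lhs = f(t * x1 + (1 - t) * x2, t * Y1 + (1 - t) * Y2)
    assert lhs <= t * f(x1, Y1) + (1 - t) * f(x2, Y2) + 1e-9
```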

Jensen's Inequality:

  1. If \(f\) is a convex function then \(f(\theta _1 x_1 + \cdots + \theta _n x_n) \le \theta _1 f(x_1) + \cdots + \theta _n f(x_n)\) for all \(x_1, \cdots, x_n \in dom \, f\), \(\theta _1 + \cdots + \theta _n = 1\), \(\theta _1, \cdots, \theta _n \ge 0\)
  2. \(f(E[x]) \le E[f(x)]\) for any random vector \(x\) such that \(x \in dom \, f\) with probability 1.

If \(C\) is convex and \(x_1, x_2, \cdots, x_n \in C\), then \(\sum_{i = 1}^n \theta _i x_i \in C\) for \(\theta _i \ge 0\), \(\theta _1 + \theta _2 + \cdots + \theta _n = 1\) (by induction on \(n\)).

Say \(x\) is a random vector such that \(x = x_i\) with probability \(\theta _i\).

Let \(y_i = f(x_i)\), so each \((x_i, y_i) \in epi(f) = \{(x, t) : f(x) \le t\}\).

Since \(epi(f)\) is a convex set, \((\sum \theta _i x_i, \sum \theta_i y_i) \in epi(f)\),

hence \(f(\sum \theta _i x_i) \le \sum \theta_i y_i = \sum \theta_i f(x_i)\).
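A quick numerical illustration of the finite form (a minimal NumPy sketch; the choice \(f = \exp\) and Dirichlet-random weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
f = np.exp                             # a convex function on R

for _ in range(1000):
    n = 5
    x = rng.standard_normal(n)
    theta = rng.dirichlet(np.ones(n))  # theta_i >= 0, sum to 1
    assert f(theta @ x) <= theta @ f(x) + 1e-12
```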

  • Example: Let \(a, b \ge 0\); then \(\sqrt{ab} \le \frac{a + b}{2}\) (AM-GM).

  • Proof: Since \(f(x) = x^2\) is convex, let \(x\) take the values \(\sqrt{a}\) and \(\sqrt{b}\), each with probability \(\frac{1}{2}\). Then

    \[\begin{align*} f(E[x]) = & f(\frac{\sqrt{a} + \sqrt{b}}{2}) \\ = & \frac{a + b + 2\sqrt{ab}}{4} \\ \le & E[f(x)] \\ = & \frac{f(\sqrt{a}) + f(\sqrt{b})}{2} \\ = & \frac{a + b}{2} \end{align*}\]

    so \(2\sqrt{ab} \le a + b\), i.e. \(\sqrt{ab} \le \frac{a + b}{2}\).

  • Example: Let \(x_1, \cdots, x_n \gt 0\), and \(\theta _1, \cdots, \theta _n \ge 0\) with \(\theta _1 + \cdots + \theta _n = 1\) as usual;

    then \(x_1^{\theta _1} x_2^{\theta _2} \cdots x_n^{\theta _n} \le \theta _1 x_1 + \theta _2 x_2 + \cdots + \theta _n x_n\) (apply Jensen to the convex \(e^t\) at the points \(t_i = \ln x_i\)).

  • Example: Let \(a, b \gt 0\), \(p, q \gt 1\), \(\frac{1}{p} + \frac{1}{q} = 1\); then \(a^{\frac{1}{p}} b^{\frac{1}{q}} \le \frac{1}{p} a + \frac{1}{q} b\) (the previous example with \(n = 2\), \(\theta _1 = \frac{1}{p}\), \(\theta _2 = \frac{1}{q}\); all three examples are checked numerically below).
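A quick check of all three examples (a NumPy sketch with random positive inputs; ranges are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)

for _ in range(1000):
    a, b = rng.uniform(0, 10, size=2)
    assert np.sqrt(a * b) <= (a + b) / 2 + 1e-12           # AM-GM

    n = 4
    x = rng.uniform(0.1, 10, size=n)
    theta = rng.dirichlet(np.ones(n))
    assert np.prod(x ** theta) <= theta @ x + 1e-12        # weighted AM-GM

    p = rng.uniform(1.1, 5); q = p / (p - 1)               # 1/p + 1/q = 1
    assert a ** (1/p) * b ** (1/q) <= a/p + b/q + 1e-12    # Young's inequality
```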

Hölder's inequality: \(x^T y \le \lVert x \rVert _p \cdot \lVert y \rVert _q\) for \(\frac{1}{p} + \frac{1}{q} = 1, p, q \gt 1\).

  1. By homogeneity, we can normalize: \(\lVert x \rVert _p = 1, \lVert y \rVert _q = 1\); also WLOG \(x_i, y_i \ge 0\) (replacing each entry by its absolute value can only increase \(x^T y\)).
  2. \(x^T y = \sum_{i = 1}^n x_i y_i = \sum_{i = 1}^n (x_i^p)^{\frac{1}{p}} (y_i^q)^{\frac{1}{q}} \le \sum_{i = 1}^n \left( \frac{1}{p} x_i^p + \frac{1}{q} y_i^q \right) = \frac{1}{p} + \frac{1}{q} = 1 = \lVert x \rVert _p \cdot \lVert y \rVert _q\)
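A numerical check of Hölder's inequality itself (sketch; the dimension and the range of \(p\) are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)

for _ in range(1000):
    n = 6
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    p = rng.uniform(1.1, 5); q = p / (p - 1)   # conjugate exponents
    lhs = x @ y
    rhs = np.linalg.norm(x, p) * np.linalg.norm(y, q)
    assert lhs <= rhs + 1e-9
```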

The two-point inequality \(a^\theta b^{1 - \theta} \le \theta a + (1 - \theta) b\) (for \(a, b \ge 0\), \(\theta \in (0, 1)\)) also has an integral proof: writing \(x = a^\theta\), \(y = b^{1 - \theta}\), it becomes

\[\begin{align*} xy \le & \theta x^{\frac{1}{\theta}} + (1 - \theta) y^{\frac{1}{1 - \theta}} \\ = & \int _0^x t^{\frac{1}{\theta} - 1} dt + \int _0^y t^{\frac{1}{1 - \theta} - 1} dt \\ = & \int _0^x t^{\frac{1 - \theta}{\theta}} dt + \int _0^y t^{\frac{\theta}{1 - \theta}} dt \end{align*}\]

Geometrically, the curves \(s = t^{\frac{1 - \theta}{\theta}}\) and \(t = s^{\frac{\theta}{1 - \theta}}\) are inverses of each other, and the two integrals are the areas between this curve and the two axes, which together cover the rectangle \([0, x] \times [0, y]\) of area \(xy\).

Let \(\mathcal{F}\) be some collection of convex functions.

Let \(g(x) = \sup _{f \in \mathcal{F}} f(x)\).

\(g(\theta x + (1 - \theta) y) = \sup _{f \in \mathcal{F}} f(\theta x + (1 - \theta) y) \le \sup _{f \in \mathcal{F}} \left[ \theta f(x) + (1 - \theta) f(y) \right] \le \theta \sup _{f \in \mathcal{F}} f(x) + (1 - \theta) \sup _{f \in \mathcal{F}} f(y) = \theta g(x) + (1 - \theta) g(y)\)

so \(g\) is convex: the pointwise supremum of a collection of convex functions is convex.
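A numerical illustration (sketch): take \(\mathcal{F}\) to be a finite family of convex functions on \(\mathbb{R}\) and check the convexity inequality of their pointwise max at random points. The particular family below is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(6)

# A finite family F of convex functions on R (arbitrary choice)
F = [np.exp, np.abs, lambda t: t ** 2, lambda t: 2 * t - 1]

def g(t):                                  # g = pointwise max over F
    return max(f(t) for f in F)

for _ in range(1000):
    x, y = rng.standard_normal(2)
    theta = rng.uniform()
    z = theta * x + (1 - theta) * y
    assert g(z) <= theta * g(x) + (1 - theta) * g(y) + 1e-12
```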